Overview

Dataset statistics

Number of variables: 18
Number of observations: 4600
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 647.0 KiB
Average record size in memory: 144.0 B
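
The average record size can be recovered from the two totals above. The following is a minimal check, assuming that KiB means 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
total_kib = 647.0
n_observations = 4600

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```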

Variable types

Categorical: 8
Numeric: 10

Alerts

country has constant value "USA" [Constant]
date has a high cardinality: 70 distinct values [High cardinality]
street has a high cardinality: 4525 distinct values [High cardinality]
statezip has a high cardinality: 77 distinct values [High cardinality]
price is highly correlated with sqft_living and 1 other field [High correlation]
bedrooms is highly correlated with bathrooms and 1 other field [High correlation]
bathrooms is highly correlated with bedrooms and 4 other fields [High correlation]
sqft_living is highly correlated with price and 3 other fields [High correlation]
sqft_lot is highly correlated with country [High correlation]
floors is highly correlated with yr_built and 1 other field [High correlation]
sqft_above is highly correlated with price and 5 other fields [High correlation]
sqft_basement is highly correlated with bathrooms and 2 other fields [High correlation]
yr_built is highly correlated with bathrooms and 5 other fields [High correlation]
yr_renovated is highly correlated with yr_built [High correlation]
date is highly correlated with country [High correlation]
waterfront is highly correlated with country [High correlation]
view is highly correlated with country [High correlation]
condition is highly correlated with yr_built [High correlation]
city is highly correlated with yr_built and 1 other field [High correlation]
statezip is highly correlated with floors and 3 other fields [High correlation]
country is highly correlated with view and 5 other fields [High correlation]
price is highly skewed (γ1 = 24.79093256) [Skewed]
street is uniformly distributed [Uniform]
price has 49 (1.1%) zeros [Zeros]
sqft_basement has 2745 (59.7%) zeros [Zeros]
yr_renovated has 2735 (59.5%) zeros [Zeros]
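
The percentages in the zero alerts are each count divided by the 4600 observations. A short sketch that recomputes them; the helper name `zero_share` is hypothetical and not part of pandas-profiling.

```python
n_rows = 4600
# Zero counts as listed in the alerts above.
zero_counts = {"price": 49, "sqft_basement": 2745, "yr_renovated": 2735}


def zero_share(count, total=n_rows):
    """Percentage of rows equal to zero, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: zero_share(count) for name, count in zero_counts.items()}
print(shares)
```

Whether a zero is a real value or a placeholder depends on the column. A zero basement area is plausible, while a zero price probably is not, but the report alone cannot settle that.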

Reproduction

Analysis started: 2022-11-05 12:56:39.557043
Analysis finished: 2022-11-05 12:56:51.513058
Duration: 11.96 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
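
A report of this kind is typically produced with a few lines of pandas-profiling. The sketch below assumes the data sits in a CSV file; the paths `data.csv` and `report.html`, the title, and the function name `build_report` are placeholders, since the source does not name the input file or show the configuration used.

```python
def build_report(csv_path="data.csv", out_path="report.html"):
    """Profile a CSV file and write the HTML report.

    Both default paths are hypothetical placeholders.
    """
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(out_path)
    return out_path
```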

Variables

date
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 70
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value                Count
2014-06-23 00:00:00  142
2014-06-25 00:00:00  131
2014-06-26 00:00:00  131
2014-07-08 00:00:00  127
2014-07-09 00:00:00  121
Other values (65)    3948

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 87400
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
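
Python's standard `unicodedata` module exposes the same general categories (Nd is Decimal Number, Pd is Dash Punctuation, Po is Other Punctuation, Zs is Space Separator). The sketch below classifies one date string and scales the result by the row count, assuming every value follows the same 19-character layout.

```python
import unicodedata
from collections import Counter

sample = "2014-05-02 00:00:00"  # a value of the date column, taken from the report

# General category code per character: Nd, Pd, Po, Zs.
per_value = Counter(unicodedata.category(ch) for ch in sample)

# Assumption: all 4600 values share this layout, so scaling gives column totals.
n_rows = 4600
totals = {code: count * n_rows for code, count in per_value.items()}
print(totals)
```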

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2014-05-02 00:00:00
2nd row: 2014-05-02 00:00:00
3rd row: 2014-05-02 00:00:00
4th row: 2014-05-02 00:00:00
5th row: 2014-05-02 00:00:00
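
The column is profiled as categorical text, which is why it triggers the high-cardinality alert. A minimal sketch of parsing one value into a proper date with the standard library; the format string is inferred from the sample rows.

```python
from datetime import datetime

raw = "2014-05-02 00:00:00"  # sample value from the report
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# The time component is midnight here, so the date alone carries the information.
day = parsed.date()
print(day.isoformat(), parsed.time().isoformat())
```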

Common Values

Value                Count  Frequency (%)
2014-06-23 00:00:00  142    3.1%
2014-06-25 00:00:00  131    2.8%
2014-06-26 00:00:00  131    2.8%
2014-07-08 00:00:00  127    2.8%
2014-07-09 00:00:00  121    2.6%
2014-06-24 00:00:00  120    2.6%
2014-07-01 00:00:00  116    2.5%
2014-05-20 00:00:00  116    2.5%
2014-06-17 00:00:00  113    2.5%
2014-05-28 00:00:00  111    2.4%
Other values (60)    3372   73.3%

Length

Histogram of lengths of the category
Value              Count  Frequency (%)
00:00:00           4600   50.0%
2014-06-23         142    1.5%
2014-06-25         131    1.4%
2014-06-26         131    1.4%
2014-07-08         127    1.4%
2014-07-09         121    1.3%
2014-06-24         120    1.3%
2014-07-01         116    1.3%
2014-05-20         116    1.3%
2014-06-17         113    1.2%
Other values (61)  3483   37.9%

Most occurring characters

Value             Count  Frequency (%)
0                 38976  44.6%
-                 9200   10.5%
:                 9200   10.5%
2                 6569   7.5%
1                 6236   7.1%
4                 4927   5.6%
(space)           4600   5.3%
6                 2641   3.0%
5                 2176   2.5%
7                 1143   1.3%
Other values (3)  1732   2.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     64400  73.7%
Dash Punctuation   9200   10.5%
Other Punctuation  9200   10.5%
Space Separator    4600   5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
038976
60.5%
26569
 
10.2%
16236
 
9.7%
44927
 
7.7%
62641
 
4.1%
52176
 
3.4%
71143
 
1.8%
3721
 
1.1%
9563
 
0.9%
8448
 
0.7%
Dash Punctuation
ValueCountFrequency (%)
-9200
100.0%
Other Punctuation
ValueCountFrequency (%)
:9200
100.0%
Space Separator
ValueCountFrequency (%)
4600
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common87400
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
038976
44.6%
-9200
 
10.5%
:9200
 
10.5%
26569
 
7.5%
16236
 
7.1%
44927
 
5.6%
4600
 
5.3%
62641
 
3.0%
52176
 
2.5%
71143
 
1.3%
Other values (3)1732
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII87400
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
038976
44.6%
-9200
 
10.5%
:9200
 
10.5%
26569
 
7.5%
16236
 
7.1%
44927
 
5.6%
4600
 
5.3%
62641
 
3.0%
52176
 
2.5%
71143
 
1.3%
Other values (3)1732
 
2.0%

price
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1741
Distinct (%): 37.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 551962.9885
Minimum: 0
Maximum: 26590000
Zeros: 49
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 200000
Q1: 322875
median: 460943.4615
Q3: 654962.5
95-th percentile: 1184050
Maximum: 26590000
Range: 26590000
Interquartile range (IQR): 332087.5

Descriptive statistics

Standard deviation: 563834.7025
Coefficient of variation (CV): 1.021508171
Kurtosis: 1044.352151
Mean: 551962.9885
Median Absolute Deviation (MAD): 157500
Skewness: 24.79093256
Sum: 2539029747
Variance: 3.179095718 × 10^11
Monotonicity: Not monotonic
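
Several of the figures above are derived from others: the coefficient of variation is the standard deviation divided by the mean, the IQR is Q3 minus Q1, the range is maximum minus minimum, and the variance is the squared standard deviation (about 3.18e11 here). A small sketch that recomputes them from the reported values:

```python
# Values copied from the price statistics above.
mean, std = 551962.9885, 563834.7025
q1, q3 = 322875, 654962.5
minimum, maximum = 0, 26590000

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance

print(cv, iqr, value_range, variance)
```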
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    49     1.1%
300000               42     0.9%
400000               31     0.7%
440000               29     0.6%
450000               29     0.6%
600000               29     0.6%
350000               28     0.6%
250000               27     0.6%
435000               27     0.6%
415000               27     0.6%
Other values (1731)  4282   93.1%

Value   Count  Frequency (%)
0       49     1.1%
7800    1      < 0.1%
80000   1      < 0.1%
83000   1      < 0.1%
83300   2      < 0.1%
84350   1      < 0.1%
87500   1      < 0.1%
90000   2      < 0.1%
100000  4      0.1%
102500  1      < 0.1%

Value     Count  Frequency (%)
26590000  1      < 0.1%
12899000  1      < 0.1%
7062500   1      < 0.1%
4668000   1      < 0.1%
4489000   1      < 0.1%
3800000   1      < 0.1%
3710000   1      < 0.1%
3200000   1      < 0.1%
3100000   1      < 0.1%
3000000   1      < 0.1%

bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.400869565
Minimum: 0
Maximum: 9
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 3
median: 3
Q3: 4
95-th percentile: 5
Maximum: 9
Range: 9
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9088481155
Coefficient of variation (CV): 0.2672399215
Kurtosis: 1.235377429
Mean: 3.400869565
Median Absolute Deviation (MAD): 1
Skewness: 0.456446633
Sum: 15644
Variance: 0.8260048971
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
3      2032   44.2%
4      1531   33.3%
2      566    12.3%
5      353    7.7%
6      61     1.3%
1      38     0.8%
7      14     0.3%
8      2      < 0.1%
0      2      < 0.1%
9      1      < 0.1%
ValueCountFrequency (%)
02
 
< 0.1%
138
 
0.8%
2566
 
12.3%
32032
44.2%
41531
33.3%
5353
 
7.7%
661
 
1.3%
714
 
0.3%
82
 
< 0.1%
91
 
< 0.1%
ValueCountFrequency (%)
91
 
< 0.1%
82
 
< 0.1%
714
 
0.3%
661
 
1.3%
5353
 
7.7%
41531
33.3%
32032
44.2%
2566
 
12.3%
138
 
0.8%
02
 
< 0.1%

bathrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 26
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.160815217
Minimum: 0
Maximum: 8
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1.75
median: 2.25
Q3: 2.5
95-th percentile: 3.5
Maximum: 8
Range: 8
Interquartile range (IQR): 0.75

Descriptive statistics

Standard deviation: 0.7837810747
Coefficient of variation (CV): 0.3627247107
Kurtosis: 1.86590471
Mean: 2.160815217
Median Absolute Deviation (MAD): 0.5
Skewness: 0.6160327234
Sum: 9939.75
Variance: 0.614312773
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Value              Count  Frequency (%)
2.5                1189   25.8%
1                  743    16.2%
1.75               629    13.7%
2                  427    9.3%
2.25               419    9.1%
1.5                291    6.3%
2.75               276    6.0%
3                  167    3.6%
3.5                162    3.5%
3.25               136    3.0%
Other values (16)  161    3.5%
ValueCountFrequency (%)
02
 
< 0.1%
0.7517
 
0.4%
1743
16.2%
1.253
 
0.1%
1.5291
 
6.3%
1.75629
13.7%
2427
 
9.3%
2.25419
 
9.1%
2.51189
25.8%
2.75276
 
6.0%
ValueCountFrequency (%)
81
 
< 0.1%
6.751
 
< 0.1%
6.51
 
< 0.1%
6.252
 
< 0.1%
5.751
 
< 0.1%
5.54
 
0.1%
5.254
 
0.1%
56
 
0.1%
4.757
 
0.2%
4.529
0.6%

sqft_living
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 566
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2139.346957
Minimum: 370
Maximum: 13540
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 370
5-th percentile: 950
Q1: 1460
median: 1980
Q3: 2620
95-th percentile: 3870
Maximum: 13540
Range: 13170
Interquartile range (IQR): 1160

Descriptive statistics

Standard deviation: 963.2069158
Coefficient of variation (CV): 0.4502340833
Kurtosis: 8.2916826
Mean: 2139.346957
Median Absolute Deviation (MAD): 570
Skewness: 1.723513271
Sum: 9840996
Variance: 927767.5626
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1940                32     0.7%
1720                32     0.7%
1660                31     0.7%
1840                31     0.7%
2000                30     0.7%
1410                29     0.6%
1200                28     0.6%
1480                28     0.6%
1700                27     0.6%
1490                27     0.6%
Other values (556)  4305   93.6%
ValueCountFrequency (%)
3701
< 0.1%
3801
< 0.1%
4201
< 0.1%
4301
< 0.1%
4901
< 0.1%
5201
< 0.1%
5501
< 0.1%
5601
< 0.1%
5801
< 0.1%
5902
< 0.1%
ValueCountFrequency (%)
135401
< 0.1%
100401
< 0.1%
96401
< 0.1%
86701
< 0.1%
80201
< 0.1%
73201
< 0.1%
72701
< 0.1%
70501
< 0.1%
69801
< 0.1%
69001
< 0.1%

sqft_lot
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3113
Distinct (%): 67.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14852.51609
Minimum: 638
Maximum: 1074218
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 638
5-th percentile: 1690.8
Q1: 5000.75
median: 7683
Q3: 11001.25
95-th percentile: 43560
Maximum: 1074218
Range: 1073580
Interquartile range (IQR): 6000.5

Descriptive statistics

Standard deviation: 35884.43614
Coefficient of variation (CV): 2.416050987
Kurtosis: 219.8729874
Mean: 14852.51609
Median Absolute Deviation (MAD): 2772
Skewness: 11.30713875
Sum: 68321574
Variance: 1287692757
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
5000                 80     1.7%
6000                 65     1.4%
4000                 54     1.2%
7200                 50     1.1%
4800                 29     0.6%
4500                 25     0.5%
9600                 25     0.5%
3000                 23     0.5%
5500                 23     0.5%
7500                 23     0.5%
Other values (3103)  4203   91.4%
ValueCountFrequency (%)
6381
< 0.1%
6811
< 0.1%
7041
< 0.1%
7461
< 0.1%
7471
< 0.1%
7501
< 0.1%
7791
< 0.1%
8331
< 0.1%
8351
< 0.1%
8442
< 0.1%
ValueCountFrequency (%)
10742181
< 0.1%
6412031
< 0.1%
4782881
< 0.1%
4356002
< 0.1%
4238381
< 0.1%
3891261
< 0.1%
3271351
< 0.1%
3077521
< 0.1%
3068481
< 0.1%
2840111
< 0.1%

floors
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.512065217
Minimum: 1
Maximum: 3.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1.5
Q3: 2
95-th percentile: 2
Maximum: 3.5
Range: 2.5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.5382883773
Coefficient of variation (CV): 0.3559954763
Kurtosis: -0.5388519795
Mean: 1.512065217
Median Absolute Deviation (MAD): 0.5
Skewness: 0.5514406463
Sum: 6955.5
Variance: 0.2897543771
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
1      2174   47.3%
2      1811   39.4%
1.5    444    9.7%
3      128    2.8%
2.5    41     0.9%
3.5    2      < 0.1%
ValueCountFrequency (%)
12174
47.3%
1.5444
 
9.7%
21811
39.4%
2.541
 
0.9%
3128
 
2.8%
3.52
 
< 0.1%
ValueCountFrequency (%)
3.52
 
< 0.1%
3128
 
2.8%
2.541
 
0.9%
21811
39.4%
1.5444
 
9.7%
12174
47.3%

waterfront
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value  Count
0      4567
1      33

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4600
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      4567   99.3%
1      33     0.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      4567   99.3%
1      33     0.7%

Most occurring characters

ValueCountFrequency (%)
04567
99.3%
133
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4600
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
04567
99.3%
133
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Common4600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
04567
99.3%
133
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII4600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
04567
99.3%
133
 
0.7%

view
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value  Count
0      4140
2      205
3      116
4      70
1      69

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4600
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 4
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      4140   90.0%
2      205    4.5%
3      116    2.5%
4      70     1.5%
1      69     1.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      4140   90.0%
2      205    4.5%
3      116    2.5%
4      70     1.5%
1      69     1.5%

Most occurring characters

ValueCountFrequency (%)
04140
90.0%
2205
 
4.5%
3116
 
2.5%
470
 
1.5%
169
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4600
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
04140
90.0%
2205
 
4.5%
3116
 
2.5%
470
 
1.5%
169
 
1.5%

Most occurring scripts

ValueCountFrequency (%)
Common4600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
04140
90.0%
2205
 
4.5%
3116
 
2.5%
470
 
1.5%
169
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII4600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
04140
90.0%
2205
 
4.5%
3116
 
2.5%
470
 
1.5%
169
 
1.5%

condition
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value  Count
3      2875
4      1252
5      435
2      32
1      6

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4600
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 5
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value  Count  Frequency (%)
3      2875   62.5%
4      1252   27.2%
5      435    9.5%
2      32     0.7%
1      6      0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
3      2875   62.5%
4      1252   27.2%
5      435    9.5%
2      32     0.7%
1      6      0.1%

Most occurring characters

ValueCountFrequency (%)
32875
62.5%
41252
27.2%
5435
 
9.5%
232
 
0.7%
16
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4600
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
32875
62.5%
41252
27.2%
5435
 
9.5%
232
 
0.7%
16
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common4600
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
32875
62.5%
41252
27.2%
5435
 
9.5%
232
 
0.7%
16
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII4600
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
32875
62.5%
41252
27.2%
5435
 
9.5%
232
 
0.7%
16
 
0.1%

sqft_above
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 511
Distinct (%): 11.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1827.265435
Minimum: 370
Maximum: 9410
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 370
5-th percentile: 860
Q1: 1190
median: 1590
Q3: 2300
95-th percentile: 3440
Maximum: 9410
Range: 9040
Interquartile range (IQR): 1110

Descriptive statistics

Standard deviation: 862.168977
Coefficient of variation (CV): 0.4718356515
Kurtosis: 4.070138265
Mean: 1827.265435
Median Absolute Deviation (MAD): 490
Skewness: 1.494210748
Sum: 8405421
Variance: 743335.3448
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1200                47     1.0%
1010                47     1.0%
1300                45     1.0%
1140                44     1.0%
1320                43     0.9%
1150                42     0.9%
1090                40     0.9%
1180                40     0.9%
1400                38     0.8%
1050                37     0.8%
Other values (501)  4177   90.8%
ValueCountFrequency (%)
3701
 
< 0.1%
3801
 
< 0.1%
4201
 
< 0.1%
4301
 
< 0.1%
4901
 
< 0.1%
5201
 
< 0.1%
5503
0.1%
5601
 
< 0.1%
5801
 
< 0.1%
5902
< 0.1%
ValueCountFrequency (%)
94101
< 0.1%
80201
< 0.1%
76801
< 0.1%
73201
< 0.1%
66401
< 0.1%
64301
< 0.1%
64201
< 0.1%
61201
< 0.1%
60701
< 0.1%
60501
< 0.1%

sqft_basement
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 207
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 312.0815217
Minimum: 0
Maximum: 4820
Zeros: 2745
Zeros (%): 59.7%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 610
95-th percentile: 1210
Maximum: 4820
Range: 4820
Interquartile range (IQR): 610

Descriptive statistics

Standard deviation: 464.1372281
Coefficient of variation (CV): 1.487230726
Kurtosis: 4.082380024
Mean: 312.0815217
Median Absolute Deviation (MAD): 0
Skewness: 1.642732192
Sum: 1435575
Variance: 215423.3665
Monotonicity: Not monotonic
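
The reported column sums suggest a simple relationship between the three area variables, which would also explain the high-correlation alerts among them. The check below only compares totals; matching sums are consistent with sqft_living = sqft_above + sqft_basement on every row but do not prove it.

```python
# Column sums as reported in the descriptive statistics of each variable.
sum_sqft_living = 9840996
sum_sqft_above = 8405421
sum_sqft_basement = 1435575

# If living area = above-ground area + basement area held for every row,
# the column sums would have to match as well.
difference = sum_sqft_living - (sum_sqft_above + sum_sqft_basement)
print(difference)
```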
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   2745   59.7%
500                 53     1.2%
600                 45     1.0%
800                 43     0.9%
900                 41     0.9%
700                 38     0.8%
1000                33     0.7%
400                 33     0.7%
550                 27     0.6%
750                 26     0.6%
Other values (197)  1516   33.0%
ValueCountFrequency (%)
02745
59.7%
201
 
< 0.1%
501
 
< 0.1%
602
 
< 0.1%
651
 
< 0.1%
701
 
< 0.1%
803
 
0.1%
902
 
< 0.1%
10014
 
0.3%
1102
 
< 0.1%
ValueCountFrequency (%)
48201
< 0.1%
41301
< 0.1%
28501
< 0.1%
27301
< 0.1%
25502
< 0.1%
23601
< 0.1%
23301
< 0.1%
23001
< 0.1%
22001
< 0.1%
21801
< 0.1%

yr_built
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 115
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1970.786304
Minimum: 1900
Maximum: 2014
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1913
Q1: 1951
median: 1976
Q3: 1997
95-th percentile: 2009
Maximum: 2014
Range: 114
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 29.73184839
Coefficient of variation (CV): 0.0150862873
Kurtosis: -0.6700759004
Mean: 1970.786304
Median Absolute Deviation (MAD): 23
Skewness: -0.50215519
Sum: 9065617
Variance: 883.9828087
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2006                111    2.4%
2005                104    2.3%
2007                93     2.0%
2004                92     2.0%
1978                90     2.0%
2003                89     1.9%
2008                89     1.9%
1967                82     1.8%
1977                80     1.7%
2014                78     1.7%
Other values (105)  3692   80.3%
ValueCountFrequency (%)
190022
0.5%
19019
 
0.2%
190210
 
0.2%
190310
 
0.2%
19049
 
0.2%
190519
0.4%
190627
0.6%
190712
0.3%
190819
0.4%
190922
0.5%
ValueCountFrequency (%)
201478
1.7%
201357
1.2%
201233
 
0.7%
201124
 
0.5%
201028
 
0.6%
200950
1.1%
200889
1.9%
200793
2.0%
2006111
2.4%
2005104
2.3%

yr_renovated
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 60
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 808.6082609
Minimum: 0
Maximum: 2014
Zeros: 2735
Zeros (%): 59.5%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1999
95-th percentile: 2011
Maximum: 2014
Range: 2014
Interquartile range (IQR): 1999

Descriptive statistics

Standard deviation: 979.4145364
Coefficient of variation (CV): 1.211234888
Kurtosis: -1.851110913
Mean: 808.6082609
Median Absolute Deviation (MAD): 0
Skewness: 0.3859187009
Sum: 3719598
Variance: 959252.8341
Monotonicity: Not monotonic
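
A mean of about 808 for a year column arises because the zeros are averaged together with real years. The sketch below uses toy values, not rows from the dataset, and assumes that 0 means "never renovated" rather than a calendar year; the source does not state this.

```python
from statistics import mean

# Toy values for illustration only; they are not rows from the dataset.
yr_renovated = [0, 0, 2003, 0, 1999]

# Assumption: 0 is a placeholder for "never renovated".
renovated_years = [year for year in yr_renovated if year != 0]
share_renovated = len(renovated_years) / len(yr_renovated)

print(mean(yr_renovated))     # pulled toward 0 by the placeholder
print(mean(renovated_years))  # mean over actual renovation years
print(share_renovated)
```

Under that assumption, a renovated / not renovated flag plus the year for renovated homes would be easier to interpret than the raw column.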
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0                  2735   59.5%
2000               170    3.7%
2003               151    3.3%
2009               109    2.4%
2001               109    2.4%
2005               95     2.1%
2004               77     1.7%
2014               72     1.6%
2006               68     1.5%
2013               61     1.3%
Other values (50)  953    20.7%
ValueCountFrequency (%)
02735
59.5%
191233
 
0.7%
19131
 
< 0.1%
192357
 
1.2%
19346
 
0.1%
19457
 
0.2%
19481
 
< 0.1%
19531
 
< 0.1%
19548
 
0.2%
19552
 
< 0.1%
ValueCountFrequency (%)
201472
1.6%
201361
1.3%
201245
1.0%
201154
1.2%
201030
 
0.7%
2009109
2.4%
200845
1.0%
20077
 
0.2%
200668
1.5%
200595
2.1%

street
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 4525
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value                  Count
2520 Mulberry Walk NE  4
2500 Mulberry Walk NE  3
9413 34th Ave SW       2
6008 8th Ave NE        2
11034 NE 26th Pl       2
Other values (4520)    4587

Length

Max length: 46
Median length: 40
Mean length: 17.01826087
Min length: 8

Characters and Unicode

Total characters: 78284
Distinct characters: 62
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4453
Unique (%): 96.8%

Sample

1st row: 18810 Densmore Ave N
2nd row: 709 W Blaine St
3rd row: 26206-26214 143rd Ave SE
4th row: 857 170th Pl NE
5th row: 9105 170th Ave NE

Common Values

Value                  Count  Frequency (%)
2520 Mulberry Walk NE  4      0.1%
2500 Mulberry Walk NE  3      0.1%
9413 34th Ave SW       2      < 0.1%
6008 8th Ave NE        2      < 0.1%
11034 NE 26th Pl       2      < 0.1%
14583 NE 58th St       2      < 0.1%
8430 8th Ave SW        2      < 0.1%
5010 Greenwood Ave N   2      < 0.1%
22840 SE 269th St      2      < 0.1%
3510 S Holly St        2      < 0.1%
Other values (4515)    4577   99.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ave1940
 
10.5%
ne1358
 
7.4%
se1180
 
6.4%
st1171
 
6.3%
pl807
 
4.4%
s562
 
3.0%
sw513
 
2.8%
n292
 
1.6%
nw288
 
1.6%
ct173
 
0.9%
Other values (4805)10157
55.1%

Most occurring characters

ValueCountFrequency (%)
13841
17.7%
15952
 
7.6%
t4628
 
5.9%
24618
 
5.9%
S3522
 
4.5%
03119
 
4.0%
33090
 
3.9%
e2891
 
3.7%
h2787
 
3.6%
42775
 
3.5%
Other values (52)31061
39.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number29557
37.8%
Lowercase Letter20844
26.6%
Space Separator13841
17.7%
Uppercase Letter13697
17.5%
Dash Punctuation343
 
0.4%
Other Punctuation2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t4628
22.2%
e2891
13.9%
h2787
13.4%
v2044
9.8%
l1311
 
6.3%
r1115
 
5.3%
n1066
 
5.1%
d1000
 
4.8%
a907
 
4.4%
s589
 
2.8%
Other values (15)2506
12.0%
Uppercase Letter
ValueCountFrequency (%)
S3522
25.7%
E2718
19.8%
A2018
14.7%
N1966
14.4%
W1183
 
8.6%
P895
 
6.5%
C283
 
2.1%
D184
 
1.3%
L150
 
1.1%
M148
 
1.1%
Other values (14)630
 
4.6%
Decimal Number
ValueCountFrequency (%)
15952
20.1%
24618
15.6%
03119
10.6%
33090
10.5%
42775
9.4%
52409
8.2%
62060
 
7.0%
71928
 
6.5%
81849
 
6.3%
91757
 
5.9%
Space Separator
ValueCountFrequency (%)
13841
100.0%
Dash Punctuation
ValueCountFrequency (%)
-343
100.0%
Other Punctuation
ValueCountFrequency (%)
/2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common43743
55.9%
Latin34541
44.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
t4628
13.4%
S3522
 
10.2%
e2891
 
8.4%
h2787
 
8.1%
E2718
 
7.9%
v2044
 
5.9%
A2018
 
5.8%
N1966
 
5.7%
l1311
 
3.8%
W1183
 
3.4%
Other values (39)9473
27.4%
Common
ValueCountFrequency (%)
13841
31.6%
15952
13.6%
24618
 
10.6%
03119
 
7.1%
33090
 
7.1%
42775
 
6.3%
52409
 
5.5%
62060
 
4.7%
71928
 
4.4%
81849
 
4.2%
Other values (3)2102
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII78284
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13841
17.7%
15952
 
7.6%
t4628
 
5.9%
24618
 
5.9%
S3522
 
4.5%
03119
 
4.0%
33090
 
3.9%
e2891
 
3.7%
h2787
 
3.6%
42775
 
3.5%
Other values (52)31061
39.7%

city
Categorical

HIGH CORRELATION

Distinct: 44
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value              Count
Seattle            1573
Renton             293
Bellevue           286
Redmond            235
Issaquah           187
Other values (39)  2026

Length

Max length: 19
Median length: 18
Mean length: 7.753913043
Min length: 4

Characters and Unicode

Total characters: 35668
Distinct characters: 45
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: Shoreline
2nd row: Seattle
3rd row: Kent
4th row: Bellevue
5th row: Redmond

Common Values

Value              Count  Frequency (%)
Seattle            1573   34.2%
Renton             293    6.4%
Bellevue           286    6.2%
Redmond            235    5.1%
Issaquah           187    4.1%
Kirkland           187    4.1%
Kent               185    4.0%
Auburn             176    3.8%
Sammamish          175    3.8%
Federal Way        148    3.2%
Other values (34)  1155   25.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
seattle1573
30.4%
renton293
 
5.7%
bellevue286
 
5.5%
redmond235
 
4.5%
issaquah187
 
3.6%
kirkland187
 
3.6%
kent185
 
3.6%
auburn176
 
3.4%
sammamish175
 
3.4%
federal148
 
2.9%
Other values (47)1722
33.3%

Most occurring characters

ValueCountFrequency (%)
e6423
18.0%
t3861
10.8%
l3602
 
10.1%
a3573
 
10.0%
n2261
 
6.3%
S1975
 
5.5%
o1382
 
3.9%
r1137
 
3.2%
d1113
 
3.1%
i1079
 
3.0%
Other values (35)9262
26.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter29903
83.8%
Uppercase Letter5197
 
14.6%
Space Separator567
 
1.6%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6423
21.5%
t3861
12.9%
l3602
12.0%
a3573
11.9%
n2261
 
7.6%
o1382
 
4.6%
r1137
 
3.8%
d1113
 
3.7%
i1079
 
3.6%
u1071
 
3.6%
Other values (14)4401
14.7%
Uppercase Letter
ValueCountFrequency (%)
S1975
38.0%
R535
 
10.3%
B453
 
8.7%
K438
 
8.4%
I274
 
5.3%
W263
 
5.1%
M253
 
4.9%
F196
 
3.8%
A182
 
3.5%
V126
 
2.4%
Other values (9)502
 
9.7%
Space Separator
ValueCountFrequency (%)
567
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin35100
98.4%
Common568
 
1.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6423
18.3%
t3861
11.0%
l3602
10.3%
a3573
10.2%
n2261
 
6.4%
S1975
 
5.6%
o1382
 
3.9%
r1137
 
3.2%
d1113
 
3.2%
i1079
 
3.1%
Other values (33)8694
24.8%
Common
ValueCountFrequency (%)
567
99.8%
-1
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII35668
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e6423
18.0%
t3861
10.8%
l3602
 
10.1%
a3573
 
10.0%
n2261
 
6.3%
S1975
 
5.5%
o1382
 
3.9%
r1137
 
3.2%
d1113
 
3.1%
i1079
 
3.0%
Other values (35)9262
26.0%

statezip
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 77
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value              Count
WA 98103           148
WA 98052           135
WA 98117           132
WA 98115           130
WA 98006           110
Other values (72)  3945

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 36800
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: WA 98133
2nd row: WA 98119
3rd row: WA 98042
4th row: WA 98008
5th row: WA 98052
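
Each statezip value combines a state code and a ZIP code separated by one space. A minimal sketch of splitting the two parts, using the sample rows above; the helper name `split_statezip` is hypothetical.

```python
samples = ["WA 98133", "WA 98119", "WA 98042", "WA 98008", "WA 98052"]


def split_statezip(value):
    """Split 'WA 98133' into ('WA', '98133'); the ZIP stays a string."""
    state, zip_code = value.split(" ", 1)
    return state, zip_code


pairs = [split_statezip(value) for value in samples]
states = {state for state, _ in pairs}
print(pairs, states)
```

Keeping the ZIP as a string avoids treating it as a quantity. In these samples the state part is always "WA", so only the ZIP varies.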

Common Values

Value              Count  Frequency (%)
WA 98103           148    3.2%
WA 98052           135    2.9%
WA 98117           132    2.9%
WA 98115           130    2.8%
WA 98006           110    2.4%
WA 98059           106    2.3%
WA 98042           100    2.2%
WA 98034           99     2.2%
WA 98053           98     2.1%
WA 98074           98     2.1%
Other values (67)  3444   74.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
wa4600
50.0%
98103148
 
1.6%
98052135
 
1.5%
98117132
 
1.4%
98115130
 
1.4%
98006110
 
1.2%
98059106
 
1.2%
98042100
 
1.1%
9803499
 
1.1%
9807498
 
1.1%
Other values (68)3542
38.5%

Most occurring characters

ValueCountFrequency (%)
85274
14.3%
95201
14.1%
W4600
12.5%
A4600
12.5%
4600
12.5%
03695
10.0%
12741
7.4%
51275
 
3.5%
21244
 
3.4%
31146
 
3.1%
Other values (3)2424
6.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number23000
62.5%
Uppercase Letter9200
 
25.0%
Space Separator4600
 
12.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
85274
22.9%
95201
22.6%
03695
16.1%
12741
11.9%
51275
 
5.5%
21244
 
5.4%
31146
 
5.0%
7876
 
3.8%
4778
 
3.4%
6770
 
3.3%
Uppercase Letter
ValueCountFrequency (%)
W4600
50.0%
A4600
50.0%
Space Separator
ValueCountFrequency (%)
4600
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common27600
75.0%
Latin9200
 
25.0%

Most frequent character per script

Common
ValueCountFrequency (%)
85274
19.1%
95201
18.8%
4600
16.7%
03695
13.4%
12741
9.9%
51275
 
4.6%
21244
 
4.5%
31146
 
4.2%
7876
 
3.2%
4778
 
2.8%
Latin
ValueCountFrequency (%)
W4600
50.0%
A4600
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII36800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
85274
14.3%
95201
14.1%
W4600
12.5%
A4600
12.5%
4600
12.5%
03695
10.0%
12741
7.4%
51275
 
3.5%
21244
 
3.4%
31146
 
3.1%
Other values (3)2424
6.6%

country
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size36.1 KiB
USA
4600 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters13800
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  USA
2nd row  USA
3rd row  USA
4th row  USA
5th row  USA

Common Values

Value  Count  Frequency (%)
USA    4600   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
usa    4600   100.0%

Most occurring characters

Value  Count  Frequency (%)
U      4600   33.3%
S      4600   33.3%
A      4600   33.3%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  13800  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
U      4600   33.3%
S      4600   33.3%
A      4600   33.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  13800  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
U      4600   33.3%
S      4600   33.3%
A      4600   33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13800  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
U      4600   33.3%
S      4600   33.3%
A      4600   33.3%

Interactions

[100 interaction plots omitted from this text version]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that maps each pair of variable types to a method: categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
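
The mapping above can be pictured as a small dispatch function. The sketch below is only an illustration: the names `auto_method` and `discretize`, the type labels, and the equal-width binning with 10 bins are assumptions made here, since the report does not state how the numerical column is discretized.

```python
def auto_method(type_a, type_b):
    """Pick a correlation method from two variable types.

    Types are the hypothetical labels "numeric" and "categorical".
    """
    if type_a == "numeric" and type_b == "numeric":
        return "spearman"
    # categorical-categorical and numerical-categorical both use Cramér's V
    return "cramers_v"

def discretize(values, bins=10):
    """Equal-width binning (an assumption) so a numeric column can be treated as categorical."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]

print(auto_method("numeric", "numeric"))      # spearman
print(auto_method("numeric", "categorical"))  # cramers_v
print(discretize([0, 5, 10], bins=2))         # [0, 1, 1]
```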

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
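
A minimal sketch of this calculation, using only the standard library: values are replaced by their ranks (ties receive the mean of their positions, a common convention assumed here) and Pearson's formula is applied to the ranks. The function names are illustrative, and the input is the `sqft_living` and `price` values of the first five sample rows.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]
rho = spearman(sqft_living, price)
print(round(rho, 3))  # 0.9
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.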

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
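
The following sketch computes r directly from that definition and checks the invariance property on a small example. The function name `pearson` and the particular shift and scale constants are illustrative; the input is the `sqft_living` and `price` values of the first five sample rows.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and of the two standard deviations cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]

r = pearson(sqft_living, price)

# Shift and rescale each variable separately (positive scale factors): r is unchanged.
r_transformed = pearson([2 * a + 3 for a in sqft_living],
                        [0.5 * b - 7 for b in price])
print(abs(r - r_transformed) < 1e-9)  # True

# Two exact linear relations with different slopes both give r = 1 (up to rounding).
print(pearson([1, 2, 3], [2, 4, 6]), pearson([1, 2, 3], [10, 20, 30]))
```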

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
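
The pair-counting definition translates almost literally into code. The sketch below implements the formula exactly as stated (often called tau-a), so tied pairs count as neither concordant nor discordant; statistical libraries commonly default to a tie-adjusted variant, which may differ when ties are present. The function name is illustrative, and the input is the `sqft_living` and `price` values of the first five sample rows.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]
tau = kendall_tau_a(sqft_living, price)
print(tau)  # 0.8: nine concordant pairs, one discordant pair, ten pairs in total
```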

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
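
For orientation, here is a minimal sketch of a bias-corrected Cramér's V computed from a contingency table, following the correction published by Bergsma (2013). It is not taken from the report generator's source; the function name is illustrative, and the code assumes that no row or column of the table sums to zero.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    # A variable with a single level leaves nothing to associate with
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

print(cramers_v_corrected([[10, 20], [20, 40]]))  # independent table: 0.0
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association: 1.0 up to rounding
```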

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available online.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
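
To make the two displays concrete, the sketch below computes a per-column count of missing cells and a text rendering of a nullity matrix. The helper names are illustrative, and the gaps in the example rows are hypothetical: the profiled dataset itself has no missing cells.

```python
def nullity_by_column(rows, columns):
    """Count missing (None) cells per column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

def nullity_matrix(rows, columns):
    """One text line per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if row.get(col) is None else "#" for col in columns)
            for row in rows]

columns = ["price", "bedrooms", "city"]
rows = [
    {"price": 313000.0, "bedrooms": 3.0, "city": "Shoreline"},
    {"price": None, "bedrooms": 5.0, "city": "Seattle"},    # hypothetical gap
    {"price": 342000.0, "bedrooms": None, "city": None},    # hypothetical gaps
]

counts = nullity_by_column(rows, columns)
matrix = nullity_matrix(rows, columns)
print(counts)
print("\n".join(matrix))
```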

Sample

First rows

      date                 price      bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  sqft_above  sqft_basement  yr_built  yr_renovated  street                    city          statezip  country
0     2014-05-02 00:00:00  313000.0   3.0       1.50       1340         7912      1.5     0           0     3          1340        0              1955      2005          18810 Densmore Ave N      Shoreline     WA 98133  USA
1     2014-05-02 00:00:00  2384000.0  5.0       2.50       3650         9050      2.0     0           4     5          3370        280            1921      0             709 W Blaine St           Seattle       WA 98119  USA
2     2014-05-02 00:00:00  342000.0   3.0       2.00       1930         11947     1.0     0           0     4          1930        0              1966      0             26206-26214 143rd Ave SE  Kent          WA 98042  USA
3     2014-05-02 00:00:00  420000.0   3.0       2.25       2000         8030      1.0     0           0     4          1000        1000           1963      0             857 170th Pl NE           Bellevue      WA 98008  USA
4     2014-05-02 00:00:00  550000.0   4.0       2.50       1940         10500     1.0     0           0     4          1140        800            1976      1992          9105 170th Ave NE         Redmond       WA 98052  USA
5     2014-05-02 00:00:00  490000.0   2.0       1.00       880          6380      1.0     0           0     3          880         0              1938      1994          522 NE 88th St            Seattle       WA 98115  USA
6     2014-05-02 00:00:00  335000.0   2.0       2.00       1350         2560      1.0     0           0     3          1350        0              1976      0             2616 174th Ave NE         Redmond       WA 98052  USA
7     2014-05-02 00:00:00  482000.0   4.0       2.50       2710         35868     2.0     0           0     3          2710        0              1989      0             23762 SE 253rd Pl         Maple Valley  WA 98038  USA
8     2014-05-02 00:00:00  452500.0   3.0       2.50       2430         88426     1.0     0           0     4          1570        860            1985      0             46611-46625 SE 129th St   North Bend    WA 98045  USA
9     2014-05-02 00:00:00  640000.0   4.0       2.00       1520         6200      1.5     0           0     3          1520        0              1945      2010          6811 55th Ave NE          Seattle       WA 98115  USA

Last rows

      date                 price          bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  sqft_above  sqft_basement  yr_built  yr_renovated  street                    city          statezip  country
4590  2014-07-08 00:00:00  380680.555556  4.0       2.50       2620         8331      2.0     0           0     3          2620        0              1991      0             13602 SE 186th Pl         Renton        WA 98058  USA
4591  2014-07-08 00:00:00  396166.666667  3.0       1.75       1880         5752      1.0     0           0     4          940         940            1945      0             3529 SW Webster St        Seattle       WA 98126  USA
4592  2014-07-08 00:00:00  252980.000000  4.0       2.50       2530         8169      2.0     0           0     3          2530        0              1993      0             37654 18th Pl S           Federal Way   WA 98003  USA
4593  2014-07-08 00:00:00  289373.307692  3.0       2.50       2538         4600      2.0     0           0     3          2538        0              2013      1923          5703 Charlotte Ave SE     Auburn        WA 98092  USA
4594  2014-07-09 00:00:00  210614.285714  3.0       2.50       1610         7223      2.0     0           0     3          1610        0              1994      0             26306 127th Ave SE        Kent          WA 98030  USA
4595  2014-07-09 00:00:00  308166.666667  3.0       1.75       1510         6360      1.0     0           0     4          1510        0              1954      1979          501 N 143rd St            Seattle       WA 98133  USA
4596  2014-07-09 00:00:00  534333.333333  3.0       2.50       1460         7573      2.0     0           0     3          1460        0              1983      2009          14855 SE 10th Pl          Bellevue      WA 98007  USA
4597  2014-07-09 00:00:00  416904.166667  3.0       2.50       3010         7014      2.0     0           0     3          3010        0              2009      0             759 Ilwaco Pl NE          Renton        WA 98059  USA
4598  2014-07-10 00:00:00  203400.000000  4.0       2.00       2090         6630      1.0     0           0     3          1070        1020           1974      0             5148 S Creston St         Seattle       WA 98178  USA
4599  2014-07-10 00:00:00  220600.000000  3.0       2.50       1490         8102      2.0     0           0     4          1490        0              1990      0             18717 SE 258th St         Covington     WA 98042  USA